Residual diagnostics
Fitted Values :- 1. Each observation in a time series can be forecast using all previous observations; these one-step forecasts are called fitted values. 2. Fitted values are often not true forecasts, because any parameters involved in the forecasting method are estimated using all the available observations in the time series, including future ones. This is the case for the average method and the drift method. The naive and seasonal naive methods involve no parameters, so their fitted values are true forecasts.
Residual :- A residual in a time series model is what is left over after fitting the model: \[e_t = y_t - fitted(y_t)\] If a model fits the data well, then 1. The residuals are uncorrelated. 2. The residuals have zero mean.
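For the naive method, for example, the fitted value at time t is just the previous observation, so the residuals can be computed by hand. A minimal base-R sketch with made-up numbers:

```r
# Minimal sketch (toy numbers): for the naive method,
# fitted(y_t) = y_{t-1}, so e_t = y_t - y_{t-1}.
y <- c(102, 105, 103, 108, 110)       # hypothetical closing prices
fitted_naive <- c(NA, y[-length(y)])  # shift the series forward by one
res <- y - fitted_naive               # residuals e_t
res                                   # NA  3 -2  5  2
```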
But more than one forecasting method may satisfy the above properties, so they cannot be used to select between methods.
Two other useful (but not necessary) properties to check are 1. The residuals have constant variance. 2. The residuals are normally distributed.
These two properties make the calculation of prediction intervals easier; if they fail, the point forecasts themselves are unaffected.
Now we will see an example of residual analysis.
library(fpp2)
## Loading required package: ggplot2
## Loading required package: forecast
## Loading required package: fma
## Loading required package: expsmooth
autoplot(goog200) +
  xlab("Day") + ylab("Closing price (in $)") +
  ggtitle("Google stock (daily ending 6 December 2013)")
For stock prices, the naive method is often the best.
autoplot(residuals(naive(goog200))) +
  xlab("Day") + ylab("") +
  ggtitle("Residuals from the Naive Method")
The graph above shows that the mean of the residuals is close to zero. Also, the variation of the residuals stays roughly the same across the historical data, apart from one outlier.
gghistogram(residuals(naive(goog200))) + ggtitle("Histogram of the residuals")
## Warning: Removed 1 rows containing non-finite values (stat_bin).
The histogram shows that the residuals are not normally distributed: there are some extreme observations on the right (positive skew). Prediction intervals calculated assuming normality may therefore be inaccurate.
ggAcf(residuals(naive(goog200))) + ggtitle("ACF of residuals")
The autocorrelation is not significant at any lag. From the graph it appears that the naive forecasts account for all the available information.
Portmanteau tests for autocorrelation :- One issue with ACF plots is that we test a hypothesis at each lag, and each test has some probability of giving a false positive. If h is large, it is likely that at least one test gives a wrong result. To overcome this, we test whether the first h autocorrelations, taken as a group, are significantly different from what is expected of a white noise process.
Box-Pierce test :- \[ Q = T \sum_{k=1}^{h} r_{k}^{2}\ ,\] where T = number of observations and h = maximum lag being considered (usually h = 10 for non-seasonal data and h = 2m for seasonal data with seasonal period m). The test does not perform well if h is too large, so if h > T/5, use h = T/5.
Ljung-Box test :- \[Q = T(T+2) \sum_{k=1}^{h}(T-k)^{-1} r_{k}^{2}\] If the residuals are white noise, both Q statistics follow a chi-square distribution with (h - K) degrees of freedom, where K is the number of parameters in the model.
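Both Q statistics can be computed directly from the sample autocorrelations and checked against R's built-in Box.test(). A sketch on simulated white noise (not on the goog200 residuals):

```r
# Box-Pierce and Ljung-Box statistics by hand, verified with Box.test().
set.seed(42)
e <- rnorm(200)                                  # simulated "residuals"
h <- 10
r <- acf(e, lag.max = h, plot = FALSE)$acf[-1]   # r_1, ..., r_h (drop lag 0)
T <- length(e)
Q_bp <- T * sum(r^2)                             # Box-Pierce
Q_lb <- T * (T + 2) * sum(r^2 / (T - 1:h))       # Ljung-Box
Q_bp - Box.test(e, lag = h)$statistic                      # ~0
Q_lb - Box.test(e, lag = h, type = "Ljung-Box")$statistic  # ~0
pvalue <- 1 - pchisq(Q_lb, df = h)               # K = 0 parameters here
```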
Box.test(residuals(naive(goog200)), lag = 10, fitdf = 0)
##
## Box-Pierce test
##
## data: residuals(naive(goog200))
## X-squared = 10.611, df = 10, p-value = 0.3886
Box.test(residuals(naive(goog200)), lag = 10, fitdf = 0, type = "Lj")
##
## Box-Ljung test
##
## data: residuals(naive(goog200))
## X-squared = 11.031, df = 10, p-value = 0.3551
The p-values are quite high, which means the residuals are not distinguishable from a white noise process.
All of the above analysis can be done using just one function.
checkresiduals(naive(goog200))
##
## Ljung-Box test
##
## data: Residuals from Naive method
## Q* = 11.031, df = 10, p-value = 0.3551
##
## Model df: 0. Total lags used: 10
Forecast accuracy
The size of the residuals is not a reliable indication of how large true forecast errors are likely to be. The accuracy of forecasts can only be determined by considering how well a model performs on new data that were not used when fitting the model.
We divide the dataset into a training set (usually 80%) and a test set (the remaining 20%). Forecast error = \(e_{T+h} = y_{T+h} - \hat{y}_{T+h}\). Forecast errors are different from residuals, since residuals are calculated on the training set.
Mean absolute error: MAE = mean(|\(e_t\)|). Root mean squared error: RMSE = \(\sqrt{mean(e_t^2)}\).
A method that minimises MAE leads to forecasts of the median, while a method that minimises RMSE leads to forecasts of the mean.
The issue with MAE and RMSE is that the results are on the same scale as the data, so we cannot compare the accuracy of methods across datasets with different units.
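A quick sketch of the two scale-dependent measures, using made-up actual and forecast values:

```r
# Scale-dependent accuracy measures on toy data (values are made up).
actual <- c(20, 22, 25, 24)
fc     <- c(21, 21, 24, 26)
e      <- actual - fc        # forecast errors
mae    <- mean(abs(e))       # 1.25
rmse   <- sqrt(mean(e^2))    # sqrt(1.75)
c(MAE = mae, RMSE = rmse)
```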
Mean absolute percentage error: MAPE = mean(|\(p_t\)|), where \(p_t = 100e_t/y_t\).
Disadvantages of percentage errors :- 1. They are infinite or undefined when \(y_t\) = 0, and take extreme values when \(y_t\) is close to zero. 2. They make sense only when the measurement is on a ratio scale, i.e. the scale has a meaningful zero (so not, say, for temperatures in Celsius). 3. They usually put a heavier penalty on negative errors than on positive errors.
To address issue (3) we can use the symmetric MAPE: sMAPE = mean( 200 * \(|y_t - \hat{y}_t|/(y_t + \hat{y}_t)\) ). This still does not solve issue (1) above.
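The two percentage measures on the same kind of toy data (all values made up; note both break down if any \(y_t\) is zero):

```r
# Percentage-error measures on toy data; both are undefined if any y_t = 0.
actual <- c(20, 22, 25, 24)
fc     <- c(21, 21, 24, 26)
e      <- actual - fc
mape   <- mean(abs(100 * e / actual))
smape  <- mean(200 * abs(actual - fc) / (actual + fc))
c(MAPE = mape, sMAPE = smape)
```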
Scaled Error :- To compare forecasts across different time series, we can use scaled errors. For non-seasonal data the scaled error is \[q_j = \frac{e_j}{\frac{1}{T-1} \sum_{t=2}^{T} |y_t - y_{t-1}|}\] The denominator is the in-sample MAE of the naive forecast method. 1. \(q_j\) is independent of the units of the data. 2. \(q_j\) is less than 1 if the forecast is better than the average in-sample naive forecast, and greater than 1 if it is worse.
For seasonal data the scaled error uses the seasonal naive method in the denominator: \[q_j = \frac{e_j}{\frac{1}{T-m} \sum_{t=m+1}^{T} |y_t - y_{t-m}|}\] Mean absolute scaled error: MASE = mean(|\(q_j\)|).
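A sketch of the non-seasonal MASE, with hypothetical training data and test-set errors:

```r
# MASE sketch (non-seasonal case): scale test errors by the in-sample MAE
# of one-step naive forecasts. All numbers here are made up.
train <- c(10, 12, 11, 14, 13, 15)  # hypothetical training series
e     <- c(1.5, -0.5, 2.0)          # hypothetical test-set forecast errors
scale <- mean(abs(diff(train)))     # (1/(T-1)) * sum |y_t - y_{t-1}| = 1.8
q     <- e / scale                  # scaled errors q_j
mase  <- mean(abs(q))
```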
We will see an example of this on Quarterly Beer Production data.
beer2 = window(ausbeer, start = 1992, end = c(2007,4))
beerfit1 = meanf(beer2, h = 10)
beerfit2 = rwf(beer2, h = 10)
beerfit3 = snaive(beer2, h = 10)
autoplot(window(ausbeer, start = 1992)) +
autolayer(beerfit1, series = "Mean", PI = FALSE) +
autolayer(beerfit2, series = "Naive", PI = FALSE) +
autolayer(beerfit3, series = "Seasonal Naive", PI = FALSE) +
ggtitle("Quarterly beer production data") +
xlab("Years") + ylab("Megalitres") +
guides(colour = guide_legend(title = "Forecast"))
The graph above suggests that the seasonal naive method forecasts the data most accurately. We will confirm this using error measures.
beer3 = window(ausbeer, start = 2008)
accuracy(beerfit1, beer3)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.000 43.62858 35.23438 -0.9365102 7.886776 2.463942
## Test set -13.775 38.44724 34.82500 -3.9698659 8.283390 2.435315
## ACF1 Theil's U
## Training set -0.10915105 NA
## Test set -0.06905715 0.801254
accuracy(beerfit2, beer3)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.4761905 65.31511 54.73016 -0.9162496 12.16415 3.827284
## Test set -51.4000000 62.69290 57.40000 -12.9549160 14.18442 4.013986
## ACF1 Theil's U
## Training set -0.24098292 NA
## Test set -0.06905715 1.254009
accuracy(beerfit3, beer3)
## ME RMSE MAE MPE MAPE MASE
## Training set -2.133333 16.78193 14.3 -0.5537713 3.313685 1.0000000
## Test set 5.200000 14.31084 13.4 1.1475536 3.168503 0.9370629
## ACF1 Theil's U
## Training set -0.2876333 NA
## Test set 0.1318407 0.298728
The seasonal naive method shows the smallest test error on all the error measures. Sometimes, though, different error measures can lead to different conclusions about which forecasting method is best.
Now we will look at a non-seasonal example: the Google stock price series.
googfc1 = meanf(goog200, h = 40)
googfc2 = rwf(goog200, h = 40)
googfc3 = rwf(goog200, drift = TRUE, h = 40)
autoplot(subset(goog, end = 240)) +
autolayer(googfc1, series = "Mean", PI = FALSE) +
autolayer(googfc2, series = "Naive", PI = FALSE) +
autolayer(googfc3, series = "Drift", PI = FALSE) +
ggtitle("Google Daily Stock Price Index") +
xlab("Day") + ylab("Closing Price(in $)") +
guides(colour = guide_legend(title = "Forecast"))
The above graph shows that the drift method is giving the best results.
googtest = window(goog, start = 201, end = 240)
accuracy(googfc1, googtest)
## ME RMSE MAE MPE MAPE
## Training set -4.296286e-15 36.91961 26.86941 -0.6596884 5.95376
## Test set 1.132697e+02 114.21375 113.26971 20.3222979 20.32230
## MASE ACF1 Theil's U
## Training set 7.182995 0.9668981 NA
## Test set 30.280376 0.8104340 13.92142
accuracy(googfc2, googtest)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.6967249 6.208148 3.740697 0.1426616 0.8437137 1.000000
## Test set 24.3677328 28.434837 24.593517 4.3171356 4.3599811 6.574582
## ACF1 Theil's U
## Training set -0.06038617 NA
## Test set 0.81043397 3.451903
accuracy(googfc3, googtest)
## ME RMSE MAE MPE MAPE
## Training set 4.653165e-13 6.168928 3.824406 -0.01570676 0.8630093
## Test set 1.008487e+01 14.077291 11.667241 1.77566103 2.0700918
## MASE ACF1 Theil's U
## Training set 1.022378 -0.06038617 NA
## Test set 3.119002 0.64732736 1.709275
All the error measures show that the drift method gives the best forecasts.
Time Series Cross Validation :- In this method we perform a series of tests; each test set contains a single observation, and the corresponding training set consists of all the observations preceding it. This way no future observations are used in constructing any forecast. The earliest observations are not used as test sets, since forecasts from a very small training set are not reliable.
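The idea can be sketched in base R without tsCV(), using the built-in AirPassengers series as a stand-in:

```r
# Rolling-origin CV sketch for one-step naive forecasts (what tsCV() automates).
y <- as.numeric(AirPassengers)   # any univariate series works here
k <- 24                          # smallest training set we allow
err <- rep(NA_real_, length(y))
for (t in k:(length(y) - 1)) {
  fc <- y[t]                     # naive one-step forecast from origin t
  err[t + 1] <- y[t + 1] - fc    # error at time t + 1
}
cv_rmse <- sqrt(mean(err^2, na.rm = TRUE))
```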
e = tsCV(goog200, rwf, drift = TRUE, h = 1)
sqrt(mean(e^2, na.rm = TRUE))
## [1] 6.233245
6.233245 is our cross-validated RMSE.
sqrt(mean(residuals(rwf(goog200, drift = TRUE))^2, na.rm = TRUE))
## [1] 6.168928
The RMSE from the residuals is smaller, as expected, because the fitted values come from a model estimated on the entire dataset, including future observations.
We can also do a multistep forecast cross validation.
e = tsCV(goog200, forecastfunction = naive, h = 8)
mse = colMeans(e^2, na.rm = TRUE)
data.frame(h = 1:8, MSE = mse) %>%
ggplot(aes(x = h, y = MSE)) + geom_point()
The plot shows how the forecast error increases as the forecast horizon increases.
Exercises :-
5). Australian Beer Data. The data is seasonal in nature, so we use a seasonal naive forecast.
beer <- window(ausbeer, start=1992)
fc <- snaive(beer)
autoplot(fc)
res <- residuals(fc)
autoplot(res)
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 32.269, df = 8, p-value = 8.336e-05
##
## Model df: 0. Total lags used: 8
The residuals look negatively skewed, and the histogram shows that they are not normally distributed. The p-value is very small, which indicates that the residuals are autocorrelated; the ACF plot shows a spike at lag 4. The residuals are scattered around zero, but some spikes can be seen on the negative side. Overall the model does reasonably well but could be improved.
6). Internet Usage per minute.
autoplot(WWWusage) +
ggtitle("Internet Usage per Minute") +
xlab("Minutes") + ylab("No. of Users")
The data shows no seasonality. We are going to use the naive method to fit the data.
fc = naive(WWWusage)
res = residuals(fc)
autoplot(res)
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Naive method
## Q* = 145.58, df = 10, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 10
The residual plot shows that the residuals are not random; there is correlation between them, and the ACF indicates the same. Overall the naive method does not fit this data well.
Quarterly clay brick production :-
autoplot(bricksq) +
ggtitle("Quarterly clay brick production") +
xlab("Quarter")
The data shows some seasonal patterns, so we are going to use the seasonal naive method.
fc = snaive(bricksq)
res = residuals(fc)
autoplot(res)
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 233.2, df = 8, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 8
The residual plot shows that the residuals have increasing variance. We are also badly overestimating in some periods, which produces spikes in the residuals on the negative side. The ACF plot also shows correlation among the residuals. The seasonal naive method does not fit this data well.
Retail Data :-
retaildata = readxl::read_excel("/Users/atyagi/Desktop/Time Series Forecasting/Time-Series-Forecasting/Time_series_data/retail.xlsx",skip =1)
## readxl works best with a newer version of the tibble package.
## You currently have tibble v1.4.2.
## Falling back to column name repair from tibble <= v1.4.2.
## Message displays once per session.
myts = ts(retaildata[,"A3349873A"], start = c(1984,4), frequency = 12)
Splitting the dataset into training and test sets.
myts.train = window(myts, end = c(2010,12))
myts.test = window(myts, start = 2011)
autoplot(myts) +
autolayer(myts.train, series = "Training") +
autolayer(myts.test, series = "Test")
The dataset is seasonal in nature, so we will apply the seasonal naive method.
fc = snaive(myts.train)
accuracy(fc, myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 8.866667 19.60898 15.41100 5.2561804 8.116672 1.000000
## Test set -0.237500 25.09626 18.47083 0.1008644 6.074517 1.198548
## ACF1 Theil's U
## Training set 0.7165701 NA
## Test set 0.4798145 0.7159931
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 567.7, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
The residual plot shows that the mean of the residuals is greater than zero, and that their variance increases with time. The residuals appear roughly normally distributed but with a sharper peak. They are also correlated, as can be seen in the ACF plot and the Ljung-Box test.
The accuracy measures give somewhat contradictory results: RMSE and MASE increase on the test set, while MAPE decreases. This happens because percentage errors penalise positive and negative errors asymmetrically, so they can be misleading when the test data differ in character from the training data.
9). Quarterly visitor nights for various regions of Australia:- Total quarterly visitor nights (in millions) from 1998-2016 for twenty regions of Australia within six states. We are going to pick just one region QLDMetro.
autoplot(visnights[,"QLDMetro"])
train1= window(visnights[,"QLDMetro"], end = c(2015,4))
train2= window(visnights[,"QLDMetro"], end = c(2014,4))
train3= window(visnights[,"QLDMetro"], end = c(2013,4))
fc1 = snaive(train1)
accuracy(fc1,window(visnights[,"QLDMetro"], start = 2016, end = c(2016,4)))
## ME RMSE MAE MPE MAPE MASE
## Training set 0.02006107 1.0462821 0.8475553 -0.2237701 7.976760 1.0000000
## Test set 0.56983879 0.9358727 0.7094002 4.6191866 6.159821 0.8369957
## ACF1 Theil's U
## Training set 0.06014484 NA
## Test set 0.09003153 0.4842061
fc2 = snaive(train2)
accuracy(fc2,window(visnights[,"QLDMetro"], start = 2015, end = c(2015,4)))
## ME RMSE MAE MPE MAPE MASE
## Training set 0.01618760 1.0735582 0.8809432 -0.2744747 8.284216 1.0000000
## Test set 0.08203656 0.4117902 0.3133488 0.5875037 3.057463 0.3556969
## ACF1 Theil's U
## Training set 0.06610879 NA
## Test set -0.59903247 0.3336559
fc3 = snaive(train3)
accuracy(fc3,window(visnights[,"QLDMetro"], start = 2014, end = c(2014,4)))
## ME RMSE MAE MPE MAPE MASE
## Training set -0.007455407 1.074544 0.8821694 -0.5625865 8.271365 1.0000000
## Test set 0.370832661 1.058658 0.8625501 4.0472032 8.476977 0.9777602
## ACF1 Theil's U
## Training set 0.07317746 NA
## Test set -0.36890062 1.177346
IBM Stock Prices :-
autoplot(ibmclose) +
ggtitle("IBM Stock Prices") +
ylab("Stock Price")
The data has a downward trend and also a cyclic pattern.
train_ibm = window(ibmclose, end = 300)
test_ibm = window(ibmclose, start = 301)
fc1 = meanf(train_ibm)
fc2 = naive(train_ibm)
fc3 = rwf(train_ibm, drift = TRUE)
accuracy(fc1, test_ibm)
## ME RMSE MAE MPE MAPE
## Training set 1.660438e-14 73.61532 58.72231 -2.642058 13.03019
## Test set -1.220933e+02 122.18978 122.09333 -32.083817 32.08382
## MASE ACF1 Theil's U
## Training set 11.52098 0.98957790 NA
## Test set 23.95401 0.05670628 18.78547
accuracy(fc2, test_ibm)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.2809365 7.302815 5.09699 -0.08262872 1.115844 1.000000
## Test set 4.8000000 6.826419 5.40000 1.24443538 1.405293 1.059449
## ACF1 Theil's U
## Training set 0.13510517 NA
## Test set 0.05670628 0.9444433
accuracy(fc3, test_ibm)
## ME RMSE MAE MPE MAPE MASE
## Training set -3.916293e-14 7.297409 5.127996 -0.02530123 1.121650 1.006083
## Test set 6.345151e+00 7.708126 6.551839 1.65200211 1.707415 1.285433
## ACF1 Theil's U
## Training set 0.13510517 NA
## Test set -0.08172572 1.100355
The naive method gives the best test accuracy of the three. Drift performs slightly worse because in the test set the series starts increasing again, probably due to the cyclic pattern visible in the time plot above.
checkresiduals(fc2)
##
## Ljung-Box test
##
## data: Residuals from Naive method
## Q* = 22.555, df = 10, p-value = 0.01251
##
## Model df: 0. Total lags used: 10
The residuals look fine at the start but grow in size over time. The Ljung-Box test also shows that the residuals are correlated, which can be seen in the ACF plot as well.
Housing Sales Data:-
autoplot(hsales) +
ggtitle("Sales of one-family houses")
The data looks seasonal with a cyclic pattern also present.
train_hsales = window(hsales, end = c(1993,12))
test_hsales = window(hsales, start = 1994)
fc1 = meanf(train_hsales)
fc2 = naive(train_hsales)
fc3 = snaive(train_hsales)
accuracy(fc1, test_hsales)
## ME RMSE MAE MPE MAPE MASE
## Training set 2.480763e-15 12.13880 9.498898 -6.120182 20.30851 1.119163
## Test set 6.451587e+00 10.00315 7.841270 9.521209 12.60939 0.923861
## ACF1 Theil's U
## Training set 0.8661515 NA
## Test set 0.2475017 1.074061
accuracy(fc2, test_hsales)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01593625 6.289813 4.988048 -0.7800232 9.880157 0.5876934
## Test set 7.40000000 10.639549 8.600000 11.1730634 13.839730 1.0132548
## ACF1 Theil's U
## Training set 0.1829708 NA
## Test set 0.2475017 1.15566
accuracy(fc3, test_hsales)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.1375000 10.576113 8.4875 -2.1016380 17.63375 1.0000000
## Test set 0.3043478 6.160886 5.0000 -0.7312374 9.12828 0.5891016
## ACF1 Theil's U
## Training set 0.838108 NA
## Test set 0.224307 0.8031005
As expected, seasonal naive method gives the best results as it takes care of the seasonality present in the data.
checkresiduals(fc3)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 684.66, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
We can see that the residuals show a cyclic pattern, because the cyclic pattern still exists in the time series. The residuals are bimodal and negatively skewed. They are also correlated, as can be seen in the ACF plot.